Discovering and Comparing Topic Hierarchies
Abstract
Hierarchies have been used for organization, summarization, and access to information, yet a lingering issue is how best to construct them. In this paper, our goal is to automatically create domain specific hierarchies that can be used for browsing a document set and locating relevant documents. We examine methods of automatically generating hierarchies and evaluating them. To this end, we compare and contrast two methods of generating topic hierarchies from the text of documents: one, subsumption hierarchies, uses subsumption relations found within document sets, and the other, lexical hierarchies, utilizes frequently used words within phrases. Our evaluation shows that subsumption hierarchies divide documents into smaller groups, allowing one to find all relevant documents without looking at as many non-relevant documents. However, such hierarchies are more likely to contain no path to a relevant document.
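The abstract does not spell out how subsumption relations are detected, so the following is only a minimal sketch of one common formulation, assumed here for illustration: a term x subsumes a term y when P(x | y) is at least some threshold and P(y | x) is below 1, with both probabilities estimated from document co-occurrence counts. The function name `subsumption_pairs`, the 0.8 threshold, and the toy documents are all hypothetical and not taken from the paper.

```python
from itertools import permutations


def subsumption_pairs(doc_terms, threshold=0.8):
    """Return (parent, child) term pairs where parent subsumes child.

    doc_terms: list of sets, one set of terms per document.
    Assumed rule (not confirmed by the source): parent subsumes child when
    P(parent | child) >= threshold and P(child | parent) < 1.
    """
    # Map each term to the set of document indices that contain it.
    postings = {}
    for doc_id, terms in enumerate(doc_terms):
        for term in terms:
            postings.setdefault(term, set()).add(doc_id)

    pairs = []
    # Check every ordered pair of distinct terms.
    for parent, child in permutations(sorted(postings), 2):
        both = len(postings[parent] & postings[child])
        p_parent_given_child = both / len(postings[child])
        p_child_given_parent = both / len(postings[parent])
        if p_parent_given_child >= threshold and p_child_given_parent < 1:
            pairs.append((parent, child))
    return pairs


# Toy document set: each document is reduced to its set of terms.
docs = [
    {"animal", "dog"},
    {"animal", "dog", "poodle"},
    {"animal", "cat"},
    {"animal"},
]
pairs = subsumption_pairs(docs)
```

On this toy input, "animal" ends up as the parent of the other three terms and "dog" as the parent of "poodle", so each step down the hierarchy narrows the set of documents a reader has to look at. A term that co-occurs with no broader term gets no parent, which is one way a hierarchy built like this could leave some documents unreachable.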
Related resources
Discovering and Comparing Topic Hierarchies: Master’s Project
A review of text mining approaches and their function in discovering and extracting a topic
Background and aim: Four text mining methods are examined and focused on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature of text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...
Using Topic Modelling Algorithms for Hierarchical Activity Discovery
Activity discovery is the unsupervised process of discovering patterns in data produced from sensor networks that are monitoring the behaviour of human subjects. Improvements in activity discovery may simplify the training of activity recognition models by enabling the automated annotation of datasets and also the construction of systems that can detect and highlight deviations from normal beha...
Discovering Non-binary Hierarchical Structures with Bayesian Rose Trees
Rich hierarchical structures are common across many disciplines, making the discovery of hierarchies a fundamental exploratory data analysis and unsupervised learning problem. Applications with natural hierarchical structure include topic hierarchies in text (Blei et al. 2010), phylogenies in evolutionary biology (Felsenstein 2003), hierarchical community structures in social networks (Girvan a...
Cross-Collection Topic Models: Automatically Comparing and Contrasting Text
This paper describes cross-collection latent Dirichlet allocation (ccLDA), a probabilistic topic model that captures meaningful word co-occurrences across multiple text collections. The model is applied to three different applications: discovering cultural differences in blogs and forums from different countries, discovering research topics across multiple scientific disciplines, and comparing ...